Tractable Query Answering Under Probabilistic Constraints

Author

  • Antoine Amarilli
Abstract

Large knowledge bases such as YAGO [SKW07] or DBpedia [BLK+09] can be used to answer queries in various domains. However, as they are automatically harvested from Web sources, they may be incomplete: important facts may be missing because they were not materialized in the original sources, or could not be extracted correctly. To mitigate this problem, approaches such as association rule mining [GTHS13] can extract statistical rules from the data which hold in most situations. For instance, people are usually nationals of the country where they were born; people who died in a place are often buried there. Applying such rules allows us to infer some of the missing facts, which may help mitigate the issue of incompleteness. Hence, we study the problem of query answering on large-scale knowledge bases under the constraints of such probabilistic deduction rules. As such rules only represent statistical tendencies, one needs to keep track of uncertainty on rule consequences when reasoning about them. There is a large body of work on probabilistic data management [SORK11]; yet, in that setting, many important tasks are intractable. For example, fixed conjunctive queries may be #P-hard [DS07] to evaluate on a probabilistic instance, even in the very simple tuple-independent database (TID) model [LLRS97]. To work around such hardness results, existing work has already investigated which query classes are tractable over all data instances, with a complex dichotomy between safe and unsafe queries [DS12]. Yet, there has been no attempt to generalize the observation that query evaluation is tractable, for all queries and for much more expressive query languages, on some instances such as probabilistic XML trees [CKS09]. Our work follows this intuition and revisits the probabilistic inference problem by studying instance classes that ensure tractability. 
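To make the TID semantics mentioned above concrete, here is a minimal Python sketch, not taken from the paper: it computes the probability of a fixed Boolean conjunctive query by enumerating every possible world of a tiny instance. The facts, their probabilities, and the names `tid`, `query_holds`, and `query_probability` are all hypothetical and chosen only for illustration. The enumeration is exponential in the number of facts, which is the naive baseline that the hardness results refer to.

```python
from itertools import product

# Hypothetical TID instance: each fact is present independently
# with the given probability (values are made up for illustration).
tid = {
    ("R", ("a",)): 0.5,
    ("S", ("a", "b")): 0.8,
    ("T", ("b",)): 0.5,
    ("S", ("a", "c")): 0.5,
    ("T", ("c",)): 0.4,
}


def query_holds(world):
    """Boolean conjunctive query: exists x, y such that R(x), S(x, y), T(y)."""
    for rel, args in world:
        if rel == "S":
            x, y = args
            if ("R", (x,)) in world and ("T", (y,)) in world:
                return True
    return False


def query_probability(instance, holds):
    """Sum the weights of all possible worlds in which the query holds.

    Brute force over 2^n worlds for n facts: fine for a toy instance,
    hopeless at scale.
    """
    facts = list(instance)
    total = 0.0
    for keep in product([False, True], repeat=len(facts)):
        world = {fact for fact, kept in zip(facts, keep) if kept}
        weight = 1.0
        for fact, kept in zip(facts, keep):
            # Independence: multiply per-fact presence/absence probabilities.
            weight *= instance[fact] if kept else 1.0 - instance[fact]
        if holds(world):
            total += weight
    return total


probability = query_probability(tid, query_holds)
```

On this toy instance the result can be checked by hand: the query needs R(a) together with at least one of the two S-T pairs, so the probability is 0.5 × (1 − (1 − 0.8 × 0.5) × (1 − 0.5 × 0.4)) = 0.26.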
More precisely, we study complexity as a function of instance treewidth, which is motivated by well-known tractability results on evaluating monadic second-order (MSO) queries on non-probabilistic bounded-treewidth instances [FFG02] and counting queries on bounded-treewidth graphs [ALS91]. This approach is also practically relevant, as the treewidth of real-world data is usually much smaller than its size. We thus show that, for the TID model, MSO query evaluation has linear data complexity if the treewidth of the instance is fixed. The TID model is not sufficient to represent the consequences of uncertain deduction rules, however: it assumes independence of all facts, whereas rule application imposes correlations between cause and consequence facts. Correlations are usually represented by probabilistic events shared between multiple facts, yet their presence makes it generally intractable to evaluate even the simplest queries, both in the relational [GT06] and XML [KS11] settings. However, we show that query evaluation is tractable if the instance has bounded width under a new notion of tree decomposition that accounts for probabilistic events; intuitively, we enforce their compatibility with the tree structure. This result implies, for example, that it is tractable to evaluate queries on the block-independent disjoint [BGMP92] probabilistic relational model, if the underlying instance has bounded treewidth in the usual sense and if the size of blocks is bounded by a constant. In the XML setting, it implies that query evaluation is tractable whenever there are only a bounded number of relevant events to propagate at any point along the tree. Finally, we turn to our original problem of query evaluation on probabilistic instances under uncertain deduction rules: the goal is to determine the answers of a query on a knowledge base, annotated by

Similar resources

Tractable Query Answering over Ontologies with Datalog+/-

We present a family of expressive extensions of Datalog, called Datalog±, as a new paradigm for query answering over ontologies. The Datalog± family admits existentially quantified variables in rule heads, and has suitable restrictions to ensure highly efficient ontology querying. In particular, we show that query answering under so-called guarded Datalog± is PTIME-complete in data complexity, ...

Tractable query answering and rewriting under description logic constraints

Answering queries over an incomplete database w.r.t. a set of constraints is an important computational task with applications in fields as diverse as information integration and metadata management in the semantic Web. Description Logics (DLs) are constraint languages that have been extensively studied with the goal of providing useful modeling constructs while keeping the query answering prob...

Query Answering in Bayesian Description Logics

The Bayesian Description Logic (BDL) BEL is a probabilistic DL, which extends the lightweight DL EL by defining a joint probability distribution over EL axioms with the help of a Bayesian network (BN). In the recent work, extensions of standard logical reasoning tasks in BEL are shown to be reducible to inferences in BNs. This work concentrates on a more general reasoning task, namely on conjun...

Consistency Checking and Querying in Probabilistic Databases under Integrity Constraints

We address the issue of incorporating a particular yet expressive form of integrity constraints (namely, denial constraints) into probabilistic databases. To this aim, we move away from the common way of giving semantics to probabilistic databases, which relies on considering a unique interpretation of the data, and address two fundamental problems: consistency checking and query evaluation. Th...

Tractable Query Answering in Indefinite Constraint Databases: Basic Results and Applications to Querying Spatiotemporal Information

We consider the scheme of indefinite constraint databases proposed by Koubarakis. This scheme can be used to represent indefinite information arising in temporal, spatial and truly spatiotemporal applications. The main technical problem that we address in this paper is the discovery of tractable classes of databases and queries in this scheme. We start with the assumption that we have a class of...


Publication date: 2014